A joint ventricle and WMH segmentation from MRI for evaluation of healthy and pathological changes in the aging brain

Age-related changes in brain structure include atrophy of the brain parenchyma and white matter changes of presumed vascular origin. Enlargement of the ventricles may occur due to atrophy or impaired cerebrospinal fluid (CSF) circulation. The co-occurrence of these changes in neurodegenerative diseases and in aging brains often requires investigators to take both into account when studying the brain; however, automated segmentation of enlarged ventricles and white matter hyperintensities (WMHs) can be a challenging task. Here, we present a hybrid multi-atlas segmentation and convolutional autoencoder approach for joint ventricle parcellation and WMH segmentation from magnetic resonance images (MRIs). Our fully automated approach uses a convolutional autoencoder to generate a standardized image of grey matter, white matter, CSF, and WMHs, which, in conjunction with labels generated by a multi-atlas segmentation approach, is then fed into a convolutional neural network to parcellate the ventricular system. Hence, our approach does not depend on manually delineated training data for new data sets. The segmentation pipeline was validated on both healthy elderly subjects and subjects with normal pressure hydrocephalus using ground truth manual labels and compared with state-of-the-art segmentation methods. We then applied the method to a cohort of 2401 elderly brains to investigate associations of ventricle volume and WMH load with various demographics and clinical biomarkers, using a multiple regression model. Our results indicate that ventricle volume and WMH load are both highly variable in a cohort of elderly subjects and that there is an independent association between the two, which highlights the importance of taking the possibility of both enlarged ventricles and WMHs into account when studying the aging brain.

Figure 1: Erroneous skull-stripping results from MONSTR that were removed from our training set. The figure shows T1-w images and the corresponding skull-stripping boundaries generated by MONSTR (red).
subset of the data at hand, which can be used for training. Alternatively, brainmasks can be manually corrected; however, this is a much more time-consuming approach.

Preparation of training data and CNN architecture
The development and evaluation of the skull-stripping U-net was performed using brain MRIs from the AGES-Reykjavik data set (cf. Section 2.1 in the main text). The brainmasks used for supervised training of the skull-stripping CNN were generated by the MONSTR method [4]. Brainmask atlases for MONSTR were created by manually delineating the brain in 6 subjects from our AGES-Reykjavik development set of 120 subjects. Manual inspection of 60 of the generated MONSTR brainmasks led to the exclusion of 13 masks due to skull-stripping failures (see Figure 1); hence, the remaining 47 masks were used for training. Our training set comprised the T1-w, T2-w, and FLAIR images and the corresponding brainmasks. The network architecture can be seen in Figure 2.

Training
The 47 training images were intensity-normalized by dividing by the 99th percentile of the non-zero elements of the image, and 80×80×80 voxel patches were extracted with a 40-voxel stride. A weighted categorical cross-entropy loss function was used, with the weights determined from class weights [8]. The network was trained for 200 epochs with a learning rate of 1 · 10⁻⁵ using the Adam optimizer [9] with Nesterov momentum [10], with β1 = 0.9, β2 = 0.999, a schedule decay of 0.004, and a batch size of 5.
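The preprocessing and loss described above can be illustrated with a minimal NumPy sketch. It is not the pipeline's actual code: the function names are hypothetical, the patch grid is assumed to skip partial patches at the volume border (the text does not say how borders were handled), and the class weights are passed in as plain numbers because their derivation is only given by reference [8].

```python
import numpy as np


def normalize_intensity(image, percentile=99.0):
    """Divide an image by the given percentile of its non-zero elements."""
    image = np.asarray(image, dtype=np.float32)
    nonzero = image[image != 0]
    if nonzero.size == 0:
        # Empty image: nothing to scale (assumed convention).
        return image
    scale = np.percentile(nonzero, percentile)
    return image / np.float32(scale)


def extract_patches(volume, patch_size=80, stride=40):
    """Collect cubic patches on a regular grid.

    Assumption: only patches that fit entirely inside the volume are kept.
    """
    starts = [range(0, dim - patch_size + 1, stride) for dim in volume.shape[:3]]
    patches = [
        volume[x:x + patch_size, y:y + patch_size, z:z + patch_size]
        for x in starts[0]
        for y in starts[1]
        for z in starts[2]
    ]
    if not patches:
        raise ValueError("volume is smaller than the patch size")
    return np.stack(patches)


def weighted_categorical_cross_entropy(probs, onehot, class_weights, eps=1e-7):
    """Mean over voxels of -sum_c w_c * y_c * log(p_c).

    probs and onehot have the class axis last; class_weights has one
    entry per class (hypothetical values, supplied by the caller).
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0)
    onehot = np.asarray(onehot, dtype=np.float64)
    weights = np.asarray(class_weights, dtype=np.float64)
    per_voxel = -np.sum(weights * onehot * np.log(probs), axis=-1)
    return float(per_voxel.mean())
```

With the values from the text (`patch_size=80`, `stride=40`), neighbouring patches overlap by half their width along each axis, so interior voxels are sampled by several patches.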

Evaluation
The evaluation of our skull-stripping method was twofold. First, we compared the results of our method to results generated by MONSTR on the development set of 120 subjects. Second, we compared the intracranial volumes (ICVs) of 2401 subjects on MRI scans acquired at two different time points (5 years apart on average). We visually inspected 9 slices of each of these 2401 subjects to detect failures and their causes. Figure 3 shows a histogram of the Dice dissimilarity (one minus the Dice similarity coefficient) between the 120 MONSTR brainmasks and the U-net brainmasks. Subjects with the highest Dice dissimilarity between the U-net and MONSTR were selected one by one for visual comparison, until the visual differences between the two brain segmentations were negligible, resulting in 8 subjects. Figure 4 shows one slice from each of these 8 subjects, which have the largest error.
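For reference, the Dice dissimilarity used in this comparison can be written as a short NumPy sketch. The function names are hypothetical, and treating two empty masks as identical is an assumed convention that the text does not address.

```python
import numpy as np


def dice_dissimilarity(mask_a, mask_b):
    """Return 1 - Dice similarity coefficient of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        # Both masks empty: treated as a perfect match (assumed convention).
        return 0.0
    dice = 2.0 * int(np.logical_and(a, b).sum()) / total
    return 1.0 - dice


def rank_by_dissimilarity(scores):
    """Sort subject IDs from largest to smallest Dice dissimilarity.

    scores maps a subject ID to its dissimilarity value.
    """
    return sorted(scores, key=scores.get, reverse=True)
```

Ranking the subjects this way puts the largest disagreements between the two methods first, which is the order in which a visual comparison would most naturally proceed.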
One limitation of this evaluation strategy is that, when comparing the overlap of the masks generated by the U-net with the masks from MONSTR, we would not expect to find large Dice dissimilarity values if both MONSTR and the U-net systematically fail in the same way. Therefore, we visually inspected 9 slices from each of the 2401 subjects with longitudinal MRI scans and found that the U-net was very robust, except in cases where: 1) one or more MRI sequences had registration errors (24 registration errors in total); and/or 2) there were visible skull-stripping errors (9 cases, of which 3 were caused by registration errors). These registration errors are marked in Figure 5, which compares the brainmask volumes at the two time points. The figure shows that the predicted ICV is generally very consistent, with the exception of the few cases that had registration and/or skull-stripping failures.
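The longitudinal consistency check can be sketched as follows, assuming the ICV is taken as the brainmask voxel count times the voxel volume. The function names and the 5% threshold are hypothetical and are not taken from the study, which relied on visual inspection; such a flag could at most help prioritize which subjects to inspect.

```python
import numpy as np


def intracranial_volume_ml(brainmask, voxel_size_mm):
    """ICV in millilitres: number of mask voxels times voxel volume."""
    voxel_volume_mm3 = float(np.prod(voxel_size_mm))
    return np.count_nonzero(brainmask) * voxel_volume_mm3 / 1000.0


def relative_icv_change(icv_first, icv_second):
    """Relative change of the second time point with respect to the first."""
    icv_first = np.asarray(icv_first, dtype=np.float64)
    icv_second = np.asarray(icv_second, dtype=np.float64)
    return (icv_second - icv_first) / icv_first


def flag_inconsistent(icv_first, icv_second, threshold=0.05):
    """Flag subjects whose ICV differs by more than the threshold.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return np.abs(relative_icv_change(icv_first, icv_second)) > threshold
```

Because the intracranial volume of an adult is not expected to change appreciably over a few years, a large relative difference between the two time points points to a processing failure rather than a biological change.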